ABSTRACT

We present a data-driven approach for target detection and identification based on a linear mixture model. Our aim is to determine the existence of certain targets in a mixture without specific information on the targets or the background, and to identify the targets from a given library. We use the maximum canonical correlation between the target set and the observations as the detection score, and use the coefficients of the canonical vector to identify the indices of the present components from the given target library. The performance of the detector is enhanced using subspace partitioning of the target library. Both simulation and experimental results are presented to demonstrate the effectiveness of the proposed method in Raman spectroscopy for detection of surface-deposited chemical agents.

Index Terms— target detection, identification, canonical correlation, subspace partitioning, Raman spectroscopy

1. INTRODUCTION

The aim of target detection in a linear mixture model is to determine whether certain components exist in a given set of observations. The linear mixture model has been widely used in signal processing applications. It can be represented as

$$X = SA + V,$$

where $X$ is the observation matrix, $A$ the matrix of mixing coefficients, $S$ the component matrix, and $V$ the noise matrix. Here $X, S, V \in \mathbb{R}^{N \times M}$ and $A \in \mathbb{R}^{M \times M}$, where $M$ is the number of observations, which we assume is equal to the dimension of the signal subspace, and $N$ is the length of each observation.
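A minimal NumPy sketch of the model and its dimensions follows. The random spectra, the sizes N and M, and the noise level are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 500, 3  # N: length of each observation; M: number of observations (= signal-subspace dimension)

S = np.abs(rng.standard_normal((N, M)))   # synthetic component spectra, one per column
A = rng.uniform(0.0, 1.0, size=(M, M))    # mixing coefficients
V = 0.01 * rng.standard_normal((N, M))    # additive noise

# Column k of X is one observed mixture: x_k = sum_m A[m, k] * s_m + v_k
X = S @ A + V
print(X.shape)  # (500, 3)
```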

The detection target can be a single specific component. Approaches such as the generalized likelihood ratio test (GLRT) [1] and detection with correlation bound (DCB) [2] have been adopted and show satisfactory performance in this case. What we address in this paper is a more challenging problem, in which the target is a library of components of interest, i.e., the present components come from a given library, but their specific indices are unknown. This problem can be divided into two steps:

1. Hypothesis testing:

Given $X = SA + V$ and a spectrum library $T = \{t_1, \ldots, t_L\}$, where $L$ is the number of target components of interest, determine whether one or more components in $T$ exist among the mixing components, i.e., decide between

$$H_0: S \cap T = \emptyset \quad \text{and} \quad H_1: S \cap T \neq \emptyset,$$

where $\emptyset$ denotes the empty set and $S$ is treated as the set of its columns.

2. Identification:

If $H_1$ holds, identify the index of the component that is present from the given library.

When the background is known and the library is guaranteed to cover all possible present components, supervised approaches such as GLRT or linear regression methods can be used for the problem [1]. In practice, however, it is usually difficult to obtain a reliable prior estimate of the background components, and the accuracy and comprehensiveness of the component library cannot be guaranteed, which limits the utility of these methods.

The aim of this paper is to develop a data-driven detection method that does not require a priori information. In practice, least squares (LS) or non-negative least squares (NNLS) methods have been used in applications such as Raman spectroscopy [1], where the interference of the background is ignored in the detection. In this paper, we use the maximum canonical correlation between the target library and a block of the mixtures as the detection score, and use the coefficients of the canonical vector to determine which components are present in the mixtures. Hence the proposed approach, detection with canonical correlation analysis (DCC), solves the detection and the identification problems at the same time.

In DCC, we use the target library as a projection subspace, hence its condition number is crucial to the performance of the detection algorithm. High canonical correlations between linear combinations of spectra are major causes of false positives as well as of incorrect identification of the components that are actually present.

We further combine the DCC detector with a library partitioning scheme by posing the partitioning as a vertex coloring problem in graph theory: spectra whose linear combinations cause high canonical correlations are regarded as adjacent vertices, which must receive different colors. Performing DCC on the partitioned subspaces (DCC-P) then improves detection performance.
We apply the proposed detection methods to Raman spectroscopy. A Raman spectrum gives a set of peaks that correspond to the characteristic vibrational frequencies of the material and can be used as a signature for identifying various materials. We consider the application of Raman spectroscopy to non-contact detection of surface-deposited chemical agents, which is particularly useful for detecting environmentally hazardous chemicals [1]. Both the simulation and the experimental results in Raman spectroscopy demonstrate the effectiveness of the proposed approach for the problem.

2. DETECTION USING CANONICAL CORRELATION

We investigate the relationship between the observation data set and the target library using canonical correlation analysis (CCA), since it provides information on the closeness of two sets of vectors.

Given two sets of vectors, $X = [x_1, \ldots, x_m] \in \mathbb{R}^{N \times m}$ and $Y = [y_1, \ldots, y_l] \in \mathbb{R}^{N \times l}$, CCA seeks a pair of vectors, $a^*$ and $b^*$, that maximize the correlation $\rho = \operatorname{corr}(Xa, Yb)$, such that

$$\rho^* = \max_{a,b} \operatorname{corr}(Xa, Yb). \qquad (1)$$

The solution of Eq. (1) can be obtained by solving the following eigenvalue problems:

$$C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}\, a^* = \rho^{*2} a^*, \qquad b^* = C_{yy}^{-1} C_{yx}\, a^*, \qquad (2)$$

where $C_{xx} = E[xx^T]$, $C_{yy} = E[yy^T]$, $C_{xy} = E[xy^T]$, and $C_{yx} = E[yx^T]$, and $b^*$ is determined up to a scale factor.

The square roots of the eigenvalues obtained from Eq. (2) are called canonical correlations, and the vectors $a^*$ and $b^*$ are called canonical vectors.

To help explain the idea of DCC, we use a noiseless model $X = SA$ and let $T = [t_1, \ldots, t_L]$ be the target set. The maximum canonical correlation between $X$ and $T$ is given by

$$\rho^* = \max_{a,b} \operatorname{corr}(Xa, Tb) = \max_{a,b} \operatorname{corr}(SAa, Tb).$$

We can see that

• Under $H_0$: $S \cap T = \emptyset$; hence $\rho^* = 0$ if the subspaces spanned by $S$ and $T$ are orthogonal. Note that this orthogonality condition is only a simplification to emphasize the general idea of this example and is not a requirement of the DCC method.

• Under $H_1$: $S \cap T \neq \emptyset$. Let $s_1 = t_j$, i.e., $S = [t_j, s_2, \ldots, s_M]$; then $\rho^* = 1$. An example solution is $a^* = A^{-1}[1, 0, \ldots, 0]^T$ and $b^* = [0, \ldots, 1_{(j)}, \ldots, 0]^T$, since $SAa^* = Tb^* = t_j$.

Note that the non-zero element in $b^*$ indicates the index of the component that is in the mixture $X$. This observation suggests that we can use the maximum canonical correlation

$$\rho^* = \max_{a,b} \operatorname{corr}(Xa, Tb) \qquad (3)$$

to solve the D-Set problem (the detection-and-identification problem posed in Section 1) in the following two steps:

1. Use $\rho^*$ for the hypothesis test to determine whether any component in the library is present.

2. Use the canonical vector $b^*$ to determine the indices of the components that are present.
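The sketch below computes $\rho^*$ and the canonical vectors directly from Eq. (2) with NumPy, using sample covariances of centered data, and reproduces the noiseless $H_0$/$H_1$ example. The function name `cca_max` and the synthetic, mutually orthogonal zero-mean spectra (chosen so that $\rho^* = 0$ exactly under $H_0$) are assumptions for illustration. Solving directly with $C_{xx}$ and $C_{yy}$ is adequate for small, well-conditioned examples; the ill-conditioned libraries discussed in Section 3 call for more care.

```python
import numpy as np

def cca_max(X, Y):
    """Largest canonical correlation between the columns of X (N x m) and Y (N x l),
    obtained from the eigenvalue problem of Eq. (2). Returns (rho, a, b)."""
    Xc = X - X.mean(axis=0)   # center, so that sample covariances give correlations
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    Cyx = Cxy.T
    # C_xx^{-1} C_xy C_yy^{-1} C_yx a = rho^2 a
    K = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cyx))
    eigvals, eigvecs = np.linalg.eig(K)
    k = int(np.argmax(eigvals.real))
    rho = np.sqrt(np.clip(eigvals[k].real, 0.0, 1.0))
    a = eigvecs[:, k].real
    b = np.linalg.solve(Cyy, Cyx @ a)   # b* = C_yy^{-1} C_yx a*, up to scale
    return float(rho), a, b

# Noiseless example with zero-mean, mutually orthogonal synthetic spectra.
rng = np.random.default_rng(1)
N, L = 200, 5
Z = rng.standard_normal((N, L + 3))
Z -= Z.mean(axis=0)
Q, _ = np.linalg.qr(Z)            # columns stay zero-mean and become orthonormal
T, B = Q[:, :L], Q[:, L:]         # target library T and background components B
A = rng.uniform(0.0, 1.0, size=(3, 3))

rho0, _, _ = cca_max(B @ A, T)    # H0: only background in the mixture
j = 3
S1 = np.column_stack([T[:, j], B[:, 0], B[:, 1]])
rho1, _, b1 = cca_max(S1 @ A, T)  # H1: s_1 = t_j
print(rho0, rho1, int(np.argmax(np.abs(b1))))  # rho0 near 0, rho1 near 1, index j = 3
```

The position of the dominant entry of `b1` plays the role of the non-zero element of $b^*$ in the $H_1$ case above.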


3. LIBRARY PARTITIONING WITH GRAPH COLORING

As Eq. (2) shows, the spectrum library matrix $T$ enters the DCC solution as a projection subspace; hence its condition plays an important role in detection performance. A canonical correlation value close to one between a library spectrum and the remaining spectra implies that this component is approximately equal to a linear combination of other components; as a result, false positives and incorrect identifications might occur in detection. The following are two examples based on the spectrum library used in our Raman spectroscopy study, which contains a total of 62 spectra, $T = [t_1, \ldots, t_{62}]$, where the first 50 are spectra of target chemicals of interest and the last 12 are spectra of background materials.

• Example 1: False positive

The canonical correlation between $t_{54}$ and $[t_{27}, t_{28}]$ is close to 1, i.e., $t_{54} \approx \alpha t_{27} + \beta t_{28}$, where $\alpha$ and $\beta$ are scalars. Hence, when $t_{54}$ is present as background, a high detection index is obtained if the whole spectrum library is used, because the targets $t_{27}$ and $t_{28}$ are in the library.

• Example 2: Misidentification

Similarly, we have $t_1 \approx \alpha t_7 + \beta t_6$. Hence, when $t_6$ and $t_7$ are the mixing chemicals, $t_1$ might be detected as the present chemical.

High canonical correlations also lead to an ill-conditioned component matrix, which is well known to be numerically unstable and sensitive to round-off errors in the computation.

Thus our objective is to partition the library by putting chemicals whose linear combinations cause high canonical correlations into different clusters. Most clustering algorithms are based on point-to-point distance measures; canonical correlation, however, is a point-to-set measure. Therefore standard clustering algorithms are not suitable for our library partitioning problem.

To reduce the canonical correlations within the spectrum library, we first take a close look at, for example, the component $t_1$. The canonical correlation between $t_1$ and $T \setminus t_1$ is $\rho_1 = 0.9983$, and the mixing vector is

$$b_1 = [\ldots, 0.01, 0.06, 0.43_{(6)}, 1.00_{(7)}, 0.03, \ldots].$$


The subscripts denote the indices of the coefficients in the mixing vector, and $b_1$ is normalized such that the maximum coefficient is equal to one. We can see that targets $t_6$ and $t_7$ contribute most significantly to the high value that $\rho_1$ attains. We call $\{t_6, t_7\}$ a forbidden pair of $t_1$, since together they cause a high canonical correlation with $t_1$. This also suggests that $\rho_1$ can be decreased by breaking up the pair, i.e., by putting $t_6$ and $t_7$ into different clusters. We continue finding such forbidden pairs for $t_1$ until $\rho_1$, computed with all such pairs split up, is below a given threshold.

Table 1. Spectrum distribution in clusters after partitioning

Once the library has been partitioned into clusters, the inter-cluster canonical correlation of $t_i$ with cluster $j$ is defined as

$$\rho_{ij}^{\text{inter}} = \max_{b} \operatorname{corr}(t_i, T_j b), \quad i = 1, \ldots, 62, \; j = 1, \ldots, K,$$

where $T_j$ is the $j$-th cluster, $T = \bigcup_{j=1}^{K} T_j$, $K$ is the total number of clusters, and $t_i \notin T_j$.
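Both $\rho_1$ above and the inter-cluster correlation $\rho_{ij}^{\text{inter}}$ have the form $\max_b \operatorname{corr}(t, Yb)$, with a single spectrum on one side. In that case the canonical correlation equals the multiple correlation of $t$ regressed on the columns of $Y$, so a least-squares fit suffices. The sketch below uses this to recover a forbidden pair on a synthetic library in which $t_1$ is built as $0.43\,t_6 + t_7$ plus a little noise (0-based columns 0, 5, and 6). The helper names are hypothetical, and only the first pair is found; the iteration until $\rho_i$ drops below the threshold is not reproduced.

```python
import numpy as np

def set_corr(t, Y):
    """max_b corr(t, Y b) for one spectrum t (N,) and a set of spectra Y (N x p).
    With one vector on one side, CCA reduces to a least-squares fit of t on Y."""
    tc = t - t.mean()
    Yc = Y - Y.mean(axis=0)
    b, *_ = np.linalg.lstsq(Yc, tc, rcond=None)
    rho = np.linalg.norm(Yc @ b) / np.linalg.norm(tc)  # = corr(tc, fitted value)
    return float(rho), b

def first_forbidden_pair(T, i):
    """Hypothetical helper: the two spectra (other than t_i) with the largest
    coefficients in the mixing vector b_i, normalized so that the maximum is one."""
    others = [k for k in range(T.shape[1]) if k != i]
    rho, b = set_corr(T[:, i], T[:, others])
    b_norm = np.abs(b) / np.abs(b).max()
    top_two = np.argsort(b_norm)[-2:]
    return rho, tuple(sorted(others[k] for k in top_two))

rng = np.random.default_rng(2)
N, L = 300, 10
T = np.abs(rng.standard_normal((N, L)))                 # synthetic library
T[:, 0] = 0.43 * T[:, 5] + T[:, 6] + 0.01 * rng.standard_normal(N)

rho, pair = first_forbidden_pair(T, 0)
print(round(rho, 3), pair)   # rho close to one; pair (5, 6), i.e. {t_6, t_7} in 1-based labels
```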

After we find all forbidden pairs for each spectrum in the library, the next step is to partition the library into a number of clusters such that the two elements of any forbidden pair are assigned to different clusters. This can be converted into a vertex coloring problem in graph theory.

In the vertex coloring problem, colors are assigned to the vertices of a graph such that no two adjacent vertices receive the same color, where two vertices are adjacent if they share an edge. In our case, each spectrum is a vertex, the two elements of each forbidden pair are adjacent vertices, and we want the number of colors, i.e., of clusters, to be as small as possible while satisfying these constraints.

Graph coloring of an arbitrary graph is an NP-hard problem and has been well studied; a number of approximation and exact algorithms have been proposed. In our problem, library partitioning is a one-time procedure as long as the library does not change, hence an exact coloring algorithm is both desirable and affordable. We implement an implicit enumeration algorithm using the backtracking method [3].
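As an illustration of exact coloring by backtracking, the following Python sketch finds a minimum coloring by trying k = 1, 2, ... colors and backtracking over vertex assignments for each k. It is a generic textbook scheme assumed for illustration, not the specific implicit enumeration algorithm of [3]. Its worst-case cost is exponential, which is usually tolerable for a one-time partition of a small library. The forbidden pairs in the usage example are hypothetical apart from the two taken from the examples above.

```python
def min_coloring(n, edges):
    """Minimum vertex coloring by backtracking. Vertices are 0..n-1 and each edge
    joins the two elements of a forbidden pair. Returns one color per vertex."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(range(n), key=lambda v: -len(adj[v]))  # high-degree vertices first
    colors = [-1] * n

    def assign(pos, k):
        # Try to color order[pos:] with colors 0..k-1, given the colors fixed so far.
        if pos == n:
            return True
        v = order[pos]
        used = {colors[u] for u in adj[v]}
        for c in range(k):
            if c not in used:
                colors[v] = c
                if assign(pos + 1, k):
                    return True
        colors[v] = -1   # undo before backtracking
        return False

    for k in range(1, n + 1):   # the first k that succeeds is the chromatic number
        if assign(0, k):
            return colors
    return colors   # reached only when n == 0

# 0-based forbidden pairs on a 62-spectrum library: {t_6, t_7}, {t_27, t_28},
# plus a made-up triangle that forces three colors.
edges = [(5, 6), (26, 27), (10, 11), (11, 12), (10, 12)]
colors = min_coloring(62, edges)
clusters = {}
for v, c in enumerate(colors):
    clusters.setdefault(c, []).append(v)   # each color class is one cluster
print(len(clusters))  # 3
```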

The results of library partitioning are shown in Table 1, where each row corresponds to a cluster and its elements. Note that the goal of library partitioning is to reduce the canonical correlations with linear combinations of spectra. The single (spectrum-to-spectrum) correlation values are fixed and cannot be decreased by partitioning. In our implementation, the spectrum pairs whose correlations are greater than the threshold are determined first, and one spectrum of each highly correlated pair is extracted from the library before partitioning. In our case, 13 spectra are pulled out of the library. The information on the extracted spectra is stored in a list of pairs. Whenever a spectrum in the list is detected, a second-stage classification is performed to identify the present chemical between this spectrum and its counterpart.

To evaluate the condition of the partitioned library, we calculate the canonical correlations of each spectrum both within its own cluster and with each of the other clusters.

The intra-cluster canonical correlation for spectrum $t_i$ is defined as

$$\rho_i^{\text{intra}} = \max_{b} \operatorname{corr}\big(t_i, (T_{c(i)} \setminus t_i)\, b\big), \quad i = 1, \ldots, 62,$$

where $T_{c(i)}$ denotes the cluster to which $t_i$ belongs.
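For checking a partition against these definitions, the sketch below computes $\rho_i^{\text{intra}}$ and $\rho_{ij}^{\text{inter}}$ for every spectrum, using the fact that CCA with a single vector on one side reduces to a least-squares fit. It assumes, as in the definition of $\rho_1$, that $t_i$ itself is excluded from its own cluster, and it assigns $\rho_i^{\text{intra}} = 0$ to a spectrum that is alone in its cluster; both are illustrative conventions. The toy library builds $t_1 \approx t_5 + t_6$ (0-based columns 0, 4, 5) and shows that splitting $t_5$ and $t_6$ into different clusters brings every value below the 0.9 threshold.

```python
import numpy as np

def set_corr(t, Y):
    """max_b corr(t, Y b) via a least-squares fit of centered t on centered Y."""
    tc = t - t.mean()
    Yc = Y - Y.mean(axis=0)
    b, *_ = np.linalg.lstsq(Yc, tc, rcond=None)
    return float(np.linalg.norm(Yc @ b) / np.linalg.norm(tc))

def cluster_correlations(T, labels):
    """intra[i] = rho_i^intra; inter[i, j] = rho_ij^inter (NaN for t_i's own cluster)."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    L = T.shape[1]
    intra = np.zeros(L)
    inter = np.full((L, len(ids)), np.nan)
    for i in range(L):
        for j, c in enumerate(ids):
            members = [k for k in np.flatnonzero(labels == c) if k != i]
            if labels[i] == c:
                # exclude t_i itself; a singleton cluster gets 0 by convention
                intra[i] = set_corr(T[:, i], T[:, members]) if members else 0.0
            else:
                inter[i, j] = set_corr(T[:, i], T[:, members])
    return intra, inter

rng = np.random.default_rng(3)
N = 300
T = rng.random((N, 6))                                   # synthetic library, 6 spectra
T[:, 0] = T[:, 4] + T[:, 5] + 0.01 * rng.standard_normal(N)

before = set_corr(T[:, 0], T[:, 1:])                     # whole library as one cluster
intra, inter = cluster_correlations(T, [0, 0, 0, 1, 1, 2])
print(round(before, 3), round(max(intra.max(), np.nanmax(inter)), 3))
```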
Fig. 1. Canonical correlations of each spectrum in the library before partitioning

Fig. 1 shows both the inter- and intra-cluster canonical correlations of each spectrum in the library after partitioning, as well as the intra-cluster canonical correlation before partitioning, where the whole library is regarded as a single cluster. Since there are seven clusters in total, each spectrum has one intra-cluster and six inter-cluster canonical correlation values after partitioning. We can see that many canonical correlations are close to one before partitioning, and that all canonical correlation values decrease below the selected threshold of 0.9 after partitioning, thus decreasing the probability of false positives and misidentifications.

In DCC-P, the DCC detector is applied to each cluster of the spectrum library, and the maximum DCC score over all clusters is taken as the DCC-P score.
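A minimal sketch of the DCC-P score follows. Instead of forming Eq. (2) explicitly, it computes the maximum canonical correlation as the largest singular value of $Q_x^T Q_y$, where $Q_x$ and $Q_y$ are orthonormal bases of the centered data; this is a standard route that is mathematically equivalent to Eq. (2) and is assumed here for numerical stability. The cluster labels, the synthetic data, and the function names are hypothetical.

```python
import numpy as np

def max_canon_corr(X, Y):
    """Largest canonical correlation between the columns of X and Y (both N x .),
    computed as the top singular value of Qx^T Qy on centered data."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])

def dcc_p_score(X, T, labels):
    """Run DCC on every cluster of the library and keep the maximum score.
    Returns (score, cluster_label)."""
    labels = np.asarray(labels)
    scores = {int(c): max_canon_corr(X, T[:, labels == c]) for c in np.unique(labels)}
    best = max(scores, key=scores.get)
    return scores[best], best

rng = np.random.default_rng(4)
N = 200
T = rng.random((N, 6))                          # synthetic library of 6 spectra
labels = [0, 1, 0, 1, 2, 2]                     # hypothetical partition into 3 clusters
A = np.array([[1.0, 0.3], [0.2, 1.0]])          # fixed, well-conditioned mixing matrix
S = np.column_stack([T[:, 2], rng.random(N)])   # t_3 (column 2) plus an unknown background
X = S @ A + 0.01 * rng.standard_normal((N, 2))  # block of 2 observations

score, best = dcc_p_score(X, T, labels)
print(round(score, 3), best)   # high score, reached in cluster 0 (which contains column 2)
```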


4. SIMULATION AND EXPERIMENTAL RESULTS IN RAMAN SPECTROSCOPY

In simulations, we randomly generate mixing matrices whose coefficients follow a uniform distribution on [0, 1]. Noise is generated from a Gaussian distribution. The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\|x\|^2}{\sigma^2}\right),$$

where $\|x\|^2$ is the average energy of the observation vectors in $X$ and $\sigma^2$ is the variance of the noise.
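The sketch below draws uniform mixing coefficients and scales Gaussian noise to a target SNR. Whether the signal energy is measured per vector or per sample is not spelled out here, so the sketch assumes a per-sample average, i.e., the mean of the squared entries of the noise-free $X$, which puts it on the same scale as $\sigma^2$. The function name `add_noise` and the synthetic spectra are hypothetical.

```python
import numpy as np

def add_noise(X_clean, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (dB).
    Assumption: 'average energy' is taken per sample, i.e. the mean of X_clean**2,
    so that it is on the same scale as the per-sample noise variance sigma^2."""
    signal_power = np.mean(X_clean ** 2)
    sigma2 = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=X_clean.shape)
    return X_clean + noise, sigma2

rng = np.random.default_rng(5)
N, M = 2000, 2
S = rng.random((N, M))                        # synthetic component spectra
A = rng.uniform(0.0, 1.0, size=(M, M))        # mixing coefficients ~ U[0, 1]
X_clean = S @ A
X, sigma2 = add_noise(X_clean, 5.0, rng)      # 5 dB, as used for Fig. 2
print(10 * np.log10(np.mean(X_clean ** 2) / sigma2))  # 5.0 up to rounding
```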

The detection performances of DCC, DCC-P, NNLS, and NNLS-P are evaluated by the receiver operating characteristic (ROC) curves shown in Fig. 2. $P_{FA}$ is the probability of false alarm, or 1 − specificity, and $P_D$ is the probability of detection, or sensitivity. The area under the ROC curve measures discrimination, which is the ability of the test to make correct decisions. The discrimination values are given in each ROC plot.

[Figure: ROC curves of $P_D$ versus probability of false alarm $P_{FA}$; legend: DCC, Disc. = 0.913; DCC-P, Disc. = 0.995; NNLS, Disc. = 0.900; NNLS-P, Disc. = 0.946]

Fig. 2. ROCs for DCC, DCC-P, NNLS, and NNLS-P (present chemical = $t_1$, background = $t_{56}$, SNR = 5 dB)

[Figure: detection results versus pulse index, panels (a) and (b)]

Fig. 3. Detection results of a laboratory experiment
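To make the discrimination values concrete, the following sketch builds an empirical ROC from detection scores collected under $H_0$ and $H_1$ (for example, 200 runs each) and integrates it with the trapezoidal rule. The function name and the "score >= threshold" decision convention are assumptions; with this convention, ties between an $H_0$ score and an $H_1$ score count as one half, the same as in the Mann-Whitney estimate of the area.

```python
import numpy as np

def roc_auc(scores_h0, scores_h1):
    """Empirical ROC and its area (discrimination).
    Sweeps a threshold over all observed scores; a detection is declared when
    score >= threshold. Returns (p_fa, p_d, auc)."""
    s0 = np.asarray(scores_h0, dtype=float)
    s1 = np.asarray(scores_h1, dtype=float)
    thresholds = np.unique(np.concatenate([s0, s1]))[::-1]   # from high to low
    p_fa = np.array([0.0] + [np.mean(s0 >= th) for th in thresholds])
    p_d = np.array([0.0] + [np.mean(s1 >= th) for th in thresholds])
    auc = float(np.sum(np.diff(p_fa) * (p_d[1:] + p_d[:-1]) / 2.0))  # trapezoidal rule
    return p_fa, p_d, auc

# Perfectly separated scores give discrimination 1; identical scores give 0.5.
print(roc_auc([0.1, 0.2], [0.8, 0.9])[2])   # 1.0
print(roc_auc([1.0, 2.0], [1.0, 2.0])[2])   # 0.5
```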

In Fig. 2, we use $t_{56}$ as the background and $t_1$ as the target chemical, with an SNR of 5 dB. For each detection run, we use a block of 2 observations to form $X$. Each curve is drawn using 200 runs.

We can see in Fig. 2 that DCC outperforms NNLS, and that library partitioning improves the detection performance of both the DCC and the NNLS detectors. The ROCs and discrimination values demonstrate the effectiveness of using the maximum canonical correlation as the detection index in DCC. We also calculate the coefficient of the canonical vector $b$ for each library spectrum in DCC-P: we first normalize the mean value of each element by its standard deviation, and then divide the vector by its maximum value. The largest elements of the resulting canonical vector are

$$b = [1.00, 0.34, \ldots, 0.09_{(9)}, \ldots, 0.09_{(17)}, \ldots],$$

where the subscript denotes the index of the corresponding element in the 50-dimensional $b$. The index of the largest coefficient in $b$ indicates that the present chemical is $t_1$, which is the correct identification in this simulation.
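One reading of this normalization is sketched below: for each library spectrum, the mean magnitude of its coefficient over the runs is divided by its standard deviation, and the resulting vector is scaled so that its maximum is one. Taking magnitudes (canonical vectors have an arbitrary sign) and using the population standard deviation are assumptions, as is the toy input.

```python
import numpy as np

def normalized_coefficients(B):
    """B: canonical vectors b from repeated runs, shape (runs, L).
    Each coefficient's mean is divided by its standard deviation across runs,
    then the vector is divided by its maximum so the largest element is one.
    Assumption: magnitudes are used because the sign of b is arbitrary."""
    B = np.abs(np.asarray(B, dtype=float))
    z = B.mean(axis=0) / B.std(axis=0)   # assumes no coefficient has zero spread
    return z / z.max()

B = [[1.0, 0.2, 0.1],
     [1.2, 0.1, 0.3],
     [0.8, 0.3, 0.2]]                  # toy example: 3 runs, L = 3 library spectra
b = normalized_coefficients(B)
print(b, int(np.argmax(b)) + 1)        # largest coefficient at t_1 (1-based index)
```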

We also examine DCC-P with a total of 10000 observations in a laboratory experiment, where the chemical MES (the 22nd spectrum in the library $T$) is dropped on one segment of an asphalt background. The solid line in Fig. 3(a) is a threshold calculated from estimated background samples for each block of 500 observations. A block size of 10 observations is used for the DCC-P detector. Since the positions where MES is present are unknown, we are not able to obtain an ROC curve. The obvious periodic pattern in Fig. 3(a), however, implies successful detection by DCC-P, since the chemical is on a rotating platform, and Fig. 3(b) shows a satisfactory correct identification rate.


5. CONCLUSION

In this paper, we propose a data-driven method for subset target detection based on a linear mixture model. By investigating the canonical correlations and canonical vectors between the mixtures and the target library, we can both detect and identify the present components from a given target library. The detector is combined with library partitioning to improve detection performance. Additional improvement can be obtained by imposing a non-negativity constraint on CCA in applications such as Raman spectroscopy and image processing, where the contributions of mixing components can only be non-negative [4]. Both simulation and experimental results in Raman spectroscopy demonstrate the effectiveness of the proposed method.